[C++] Drastically improve multi-threaded performance #3550

jcking · 2022-02-22T20:17:59Z

The original implementation uses two static std::shared_mutex instances that are shared among all ParserATNSimulator and LexerATNSimulator objects, so all generated lexers and parsers contend on the same global locks. This is unnecessary: all data is rooted in antlr4::atn::ATN, meaning all data that needs synchronized access belongs to exactly one antlr4::atn::ATN. With this in mind, we can shift the locks from static to per antlr4::atn::ATN. This drastically improves performance as the number of threads goes up, in some cases by as much as 40%.
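As a rough illustration of the change described above, the sketch below contrasts a process-wide static lock pair with a lock pair owned by each ATN. The type and member names (`GlobalLockSimulator`, `Atn`, `PerAtnSimulator`, `stateMutex`, `edgeMutex`) are hypothetical stand-ins, not the runtime's actual declarations.

```cpp
#include <cassert>
#include <shared_mutex>

// Before (hypothetical names): one lock pair for the whole process, so every
// lexer and parser of every grammar contends on the same two mutexes.
struct GlobalLockSimulator {
  static std::shared_mutex stateLock;
  static std::shared_mutex edgeLock;
};
std::shared_mutex GlobalLockSimulator::stateLock;
std::shared_mutex GlobalLockSimulator::edgeLock;

// After (hypothetical names): each ATN owns its lock pair. The mutexes are
// mutable so that simulators holding a const reference can still lock them.
struct Atn {
  mutable std::shared_mutex stateMutex;
  mutable std::shared_mutex edgeMutex;
};

// A simulator no longer owns any lock; it borrows the locks of its ATN.
class PerAtnSimulator {
 public:
  explicit PerAtnSimulator(const Atn& atn) : atn_(atn) {}
  std::shared_mutex& stateLock() const { return atn_.stateMutex; }
  std::shared_mutex& edgeLock() const { return atn_.edgeMutex; }

 private:
  const Atn& atn_;
};
```

Simulators built on the same `Atn` still serialize against each other, which is the required behavior; simulators built on different `Atn` objects never touch the same mutex.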

jcking · 2022-02-22T20:23:05Z

From CEL's parsing benchmarks using various thread configurations:

name                 old cpu/op  new cpu/op  delta
BM_Parse/threads:1   3.27ms ± 2%  3.33ms ± 6%   +1.86%  (p=0.000 n=49+44)
BM_Parse/threads:2   4.32ms ± 5%  4.00ms ± 5%   -7.37%  (p=0.000 n=51+54)
BM_Parse/threads:4   7.19ms ± 3%  5.39ms ± 5%  -25.07%  (p=0.000 n=49+51)
BM_Parse/threads:8   15.2ms ± 4%   9.6ms ± 6%  -36.50%  (p=0.000 n=33+55)
BM_Parse/threads:12  26.9ms ± 1%  16.5ms ± 4%  -38.88%  (p=0.000 n=13+18)

name                 old time/op             new time/op             delta
BM_Parse/threads:1   3.27ms ± 2%             3.33ms ± 6%   +1.86%        (p=0.000 n=49+44)
BM_Parse/threads:2   2.18ms ± 5%             2.02ms ± 6%   -7.29%        (p=0.000 n=51+56)
BM_Parse/threads:4   1.88ms ± 4%             1.39ms ± 6%  -26.32%        (p=0.000 n=45+48)
BM_Parse/threads:8   2.30ms ± 3%             1.47ms ± 6%  -36.14%        (p=0.000 n=31+56)
BM_Parse/threads:12  2.73ms ± 1%             1.62ms ± 2%  -40.75%        (p=0.000 n=13+20)

name                 old INSTRUCTIONS/op     new INSTRUCTIONS/op     delta
BM_Parse/threads:1    24.7M ± 0%              24.2M ± 0%   -1.99%        (p=0.000 n=50+47)
BM_Parse/threads:2    24.7M ± 0%              24.2M ± 0%   -2.02%        (p=0.000 n=54+57)
BM_Parse/threads:4    24.7M ± 0%              24.2M ± 0%   -2.10%        (p=0.000 n=50+52)
BM_Parse/threads:8    24.8M ± 0%              24.1M ± 0%   -2.45%        (p=0.000 n=31+57)
BM_Parse/threads:12   24.8M ± 0%              24.1M ± 0%   -2.67%        (p=0.000 n=12+19)

name                 old CYCLES/op           new CYCLES/op           delta
BM_Parse/threads:1    12.3M ± 2%              12.5M ± 6%   +1.75%        (p=0.000 n=49+42)
BM_Parse/threads:2    14.1M ± 2%              13.7M ± 5%   -2.71%        (p=0.000 n=51+47)
BM_Parse/threads:4    17.9M ± 2%              15.9M ± 8%  -11.64%        (p=0.000 n=47+51)
BM_Parse/threads:8    26.4M ± 5%              22.5M ± 8%  -14.73%        (p=0.000 n=31+57)
BM_Parse/threads:12   35.5M ± 2%              30.0M ± 3%  -15.42%        (p=0.000 n=13+20)

name                 old allocs/op           new allocs/op           delta
BM_Parse/threads:1    33.5k ± 0%              33.5k ± 0%     ~     (all samples are equal)
BM_Parse/threads:2    33.5k ± 0%              33.5k ± 0%     ~     (all samples are equal)
BM_Parse/threads:4    33.5k ± 0%              33.5k ± 0%     ~     (all samples are equal)
BM_Parse/threads:8    33.5k ± 0%              33.5k ± 0%     ~     (all samples are equal)
BM_Parse/threads:12   33.5k ± 0%              33.5k ± 0%     ~     (all samples are equal)

name                 old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_Parse/threads:1    40.4k ± 0%              40.4k ± 0%     ~     (all samples are equal)
BM_Parse/threads:2    40.4k ± 0%              40.4k ± 0%     ~     (all samples are equal)
BM_Parse/threads:4    40.4k ± 0%              40.4k ± 0%     ~     (all samples are equal)
BM_Parse/threads:8    40.4k ± 0%              40.4k ± 0%     ~     (all samples are equal)
BM_Parse/threads:12   40.4k ± 0%              40.4k ± 0%     ~     (all samples are equal)

jcking · 2022-02-22T20:25:17Z

@mike-lischke

parrt · 2022-02-24T18:22:34Z

If this PR proves/looks useful, @mike-lischke , I'd like to wait to incorporate this into 4.10.

mike-lischke · 2022-02-25T08:24:02Z

@parrt How much time do we have to review this patch?

I'm hesitant to approve this patch, first, because it is pretty large again, second, it goes to the heart of the runtime access synchronisation and third, I'm not really sure it's correct.

@jcking The 2 mutexes you changed were static, right, but that is necessary because the ATN is shared between all parser instances of the same grammar type. I agree that there's some penalty if you use multiple different parser types at the same time, multithreaded (so, it would make some sense to make these mutexes instance variables). But how often does this really happen? Even if you need to parse different languages, you rarely do this all in parallel.

Your numbers are impressive (at least for time/op, while cycles/op is significantly lower), but I wonder how you took them. Additionally, I don't know what CEL is and what all those times/op, cycles/op etc. mean. What is the op here? One parse run? Did you test with multiple different grammars? How did you run the tests? Instantiate 12 parsers in 12 threads, all for the same grammar, parsing all the same input?

I have not had the time to go fully through the patch, but what certainly jumps out is the removal of the locks around the ATN access, which I think is wrong. Access is no longer serialized there. Can you explain your intention there?

jcking · 2022-02-25T16:02:29Z

> @jcking The 2 mutexes you changed were static, right, but that is necessary because the ATN is shared between all parser instances of the same grammar type. I agree that there's some penalty if you use multiple different parser types at the same time, multithreaded (so, it would make some sense to make these mutexes instance variables). But how often does this really happen? Even if you need to parse different languages, you rarely do this all in parallel.

The mutexes were static and shared between all parser AND lexer instances of ALL grammar types. The scenario is very common. Think of API filtering expressions that you need to parse. You may be handling thousands simultaneously. Currently throughput is heavily restricted. At scale, handling different languages simultaneously also happens. When you are dealing in the millions of files you need to parse of different languages, doing that serially is infeasible.

> Your numbers are impressive (at least for time/op, while cycles/op is significantly lower), but I wonder how you took them. Additionally, I don't know what CEL is and what all those times/op, cycles/op etc. mean. What is the op here? One parse run? Did you test with multiple different grammars? How did you run the tests? Instantiate 12 parsers in 12 threads, all for the same grammar, parsing all the same input?

It's based on https://github.com/google/cel-cpp/blob/master/parser/parser_test.cc. Each iteration is over all the expressions. Multiple thread configurations are used. BM_Parse is the benchmark, written with google/benchmark, which runs the benchmark function simultaneously in multiple threads. The comparison is the current head of the ANTLRv4 dev branch versus that same head plus this patch. https://pkg.go.dev/golang.org/x/perf/cmd/benchstat is the summarizer.
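To make "runs the benchmark function simultaneously in multiple threads" concrete, here is a standard-library-only sketch of that idea. It is not how google/benchmark is implemented and it does not reproduce its timing rules; the function name `wallNanosPerOp` and its definition of time per op (wall-clock time divided by the total number of calls) are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Runs `op` `iterations` times on each of `threads` threads, releasing all
// threads together, and returns wall-clock nanoseconds divided by the total
// number of calls. Hypothetical helper, not part of any benchmark library.
double wallNanosPerOp(int threads, int iterations,
                      const std::function<void()>& op) {
  std::atomic<bool> go{false};
  std::vector<std::thread> workers;
  workers.reserve(threads);
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      // Spin until every worker has been created, so they start together.
      while (!go.load(std::memory_order_acquire)) std::this_thread::yield();
      for (int i = 0; i < iterations; ++i) op();
    });
  }
  const auto start = std::chrono::steady_clock::now();
  go.store(true, std::memory_order_release);
  for (auto& worker : workers) worker.join();
  const auto elapsed = std::chrono::steady_clock::now() - start;
  const double nanos =
      std::chrono::duration<double, std::nano>(elapsed).count();
  return nanos / (static_cast<double>(threads) * iterations);
}
```

If `op` takes a lock that every thread shares, the per-op figure grows with the thread count, which is the kind of effect the old columns in the tables above show.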

> I have not had the time to go fully through the patch, but what certainly jumps out is the removal of the locks around the ATN access, which I think is wrong. Access is no longer serialized there. Can you explain your intention there?

No locking was removed. The locks were simply shifted from static to being per ATN. Since all DFAState and ATNState are owned by exactly one ATN, access only needs to be serialized on a per ATN basis. The other data is constant after creation or being added to DFA, DFAState, or ATNState.

I also ran the same benchmark with TSAN enabled and there was no error. I am happy to test other grammars.

parrt · 2022-02-25T17:21:38Z

@mike-lischke we have time. As you say let's proceed with caution. ATN stuff is scary to change. Running off to teach but around later today and weekend.

mike-lischke

After taking a long look at the patch I believe it's a completely fine one. My original impression that locks were removed was based on something I saw while roughly scanning the patch for obvious mistakes. It turned out that only one lock was removed (actually moved up to the caller of that function), so this is all correct.

The only small issue I still have is that comment I made about a large string used for lookup.

runtime/Cpp/runtime/src/Parser.cpp

mike-lischke · 2022-02-26T11:22:42Z

> This drastically improves performance as the number of threads go up, in some cases as much as 40%.

However, there's a slight speed decrease for single threaded use. Not a big deal, but I wonder where this comes from.

parrt · 2022-02-26T18:48:36Z

Hi Guys, I decided to look at the locking mechanism in Java and, fortunately, I was good to my future self and wrote a comment which I will paste here for easier retrieval by others. ALL(*) is a super complicated algorithm and I'm surprised I could hold it in my head earlier all at once. haha.

Thread safety in ATN/DFA construction

The ParserATNSimulator locks on the decisionToDFA field when it adds a new DFA object to that array. addDFAEdge() locks on the DFA for the current decision when setting the DFAState.edges field. addDFAState() locks on the DFA for the current decision when looking up a DFA state to see if it already exists. We must make sure that all requests to add DFA states that are equivalent result in the same shared DFA object. This is because lots of threads will be trying to update the DFA at once. The addDFAState() method also locks inside the DFA lock but this time on the shared context cache when it rebuilds the configurations' PredictionContext objects using cached subgraphs/nodes. No other locking occurs, even during DFA simulation. This is safe as long as we can guarantee that all threads referencing s.edge[t] get the same physical target DFAState, or null. Once into the DFA, the DFA simulation does not reference the DFA.states map. It follows the DFAState.edges field to new targets. The DFA simulator will either find DFAState#edges to be null, to be non-null and dfa.edges[t] null, or dfa.edges[t] to be non-null. The addDFAEdge() method could be racing to set the field but in either case the DFA simulator works; if null, and requests ATN simulation. It could also race trying to get dfa.edges[t], but either way it will work because it's not doing a test and set operation.
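The requirement that equivalent DFA states resolve to one shared object can be sketched in C++ as a lookup-or-insert under a lock. This is a simplified illustration, not the runtime's code: `Dfa`, `DfaState` and `addState` are hypothetical names, and a plain string stands in for the configuration set that identifies a state.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for a DFA state; `configs` plays the role of the configuration
// set that makes two states equivalent.
struct DfaState {
  explicit DfaState(std::string c) : configs(std::move(c)) {}
  std::string configs;
};

class Dfa {
 public:
  // Returns the canonical state for `configs`, creating it only if no
  // equivalent state exists yet. The lookup and the insert happen under the
  // same lock, so racing callers all receive the same pointer.
  DfaState* addState(const std::string& configs) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = states_.find(configs);
    if (it == states_.end()) {
      it = states_.emplace(configs, std::make_unique<DfaState>(configs)).first;
    }
    return it->second.get();
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return states_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::unique_ptr<DfaState>> states_;
};
```

Because each state lives behind a `std::unique_ptr`, the returned pointer stays valid when the map rehashes, which is what lets edges hold on to the "same physical target" state.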

parrt · 2022-02-26T18:49:07Z

I'll leave this for further discussion until the coming work week.

mike-lischke · 2022-02-27T18:27:20Z

Thanks for the explanation @parrt. The task for the C++ target was to find a replacement for Java's synchronized block. There's no C++ equivalent for per-object locks, so we used the second-best option: serialize access to the code paths that use the object guarded by the Java synchronized block. This required some careful checks, but turned out to be pretty easy, as only very few code paths need protection. As you know, in C++ everything is a bit more explicit, so we had to define exactly where shared read or locked write access has to happen.
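The "shared read or locked write" distinction maps onto `std::shared_lock` and `std::unique_lock` over a `std::shared_mutex`. The sketch below shows the pattern on a hypothetical `EdgeTable` class; the class and its methods are illustrative assumptions rather than the runtime's actual edge storage.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

struct DfaState {
  int id;
};

// Hypothetical edge table: one target state per input symbol.
class EdgeTable {
 public:
  explicit EdgeTable(std::size_t symbols) : edges_(symbols, nullptr) {}

  // Read path: any number of threads may hold the shared lock at once.
  const DfaState* target(std::size_t symbol) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return symbol < edges_.size() ? edges_[symbol] : nullptr;
  }

  // Write path: the exclusive lock waits for readers and other writers.
  void setTarget(std::size_t symbol, const DfaState* state) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    if (symbol >= edges_.size()) edges_.resize(symbol + 1, nullptr);
    edges_[symbol] = state;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::vector<const DfaState*> edges_;
};
```

Unlike a Java synchronized block, nothing here is implicit: each code path has to pick the read or the write lock itself, which is the extra explicitness mentioned above.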

In the current solution the C++ target uses two mutexes, one for DFA state manipulation and the other for edge access control. These mutexes are defined as static members of the ATNSimulator class and hence there are only 2 mutexes in an application that uses any of the simulator classes. This in turn means all access to ATN and DFA goes through these 2 locks, even if there is actually no concurrent access (for example when using two different ATNs for different languages).

Justin now has changed it so that these mutexes became members of the ATN class (not the simulators, as we need to protect the ATN and DFA, not the simulators). So these locks became a per-instance thing and there's no interference between parsers working with different grammars (and hence ATNs). This improves performance for such scenarios.

~~What's not fully clear to me is where the speed increase in the tested scenarios comes from. I assume Justin only tested parsers all for the same grammar.~~ However the code is constantly checked with TSAN (the thread sanitizer), which will point out (potentially) dangerous code paths in multi-threaded environments. And this patch brought up no warnings, it seems. So, I had no reason not to approve it.

Right after posting the comment I realized what the reason for the speed increase is, even for a single grammar type: the same mutexes are used by both the lexer ATN simulator and the parser ATN simulator, so the two can hinder each other. With the per-ATN instance locks this is a thing of the past. Both simulators now have their own locks, as they don't share the ATN.
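A small sketch of why a lexer and a parser for the same grammar stop hindering each other once each ATN has its own lock: an exclusive lock on one ATN does not prevent taking an exclusive lock on the other. The `Atn` struct, the `edgeMutex` member and the helper function are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical ATN that owns its own lock.
struct Atn {
  mutable std::shared_mutex edgeMutex;
};

// Holds the lexer ATN's write lock, then tries to take the parser ATN's
// write lock without blocking. Returns true if the second lock was acquired.
// Precondition: the two arguments are distinct objects. With a single static
// mutex shared by both simulators, the second writer would have to wait.
bool parserCanWriteWhileLexerWrites(const Atn& lexerAtn, const Atn& parserAtn) {
  std::unique_lock<std::shared_mutex> lexerLock(lexerAtn.edgeMutex);
  // try_to_lock never blocks. The standard permits try_lock to fail
  // spuriously, so a false result does not prove the mutex was held.
  std::unique_lock<std::shared_mutex> parserLock(parserAtn.edgeMutex,
                                                 std::try_to_lock);
  return parserLock.owns_lock();
}
```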


jcking marked this pull request as ready for review February 22, 2022 20:24

jcking force-pushed the cpp-threading-perf branch from 69d82cc to 4f47ce2 Compare February 22, 2022 20:34

parrt added the target:cpp label Feb 22, 2022

parrt requested a review from mike-lischke February 22, 2022 20:52

parrt added the comp:performance label Feb 22, 2022

mike-lischke suggested changes Feb 26, 2022


runtime/Cpp/runtime/src/Parser.cpp (review comment resolved)

[C++] Drastically improve multi-threaded performance

359b220

jcking force-pushed the cpp-threading-perf branch from 4f47ce2 to 359b220 Compare February 27, 2022 18:41

jcking requested a review from mike-lischke February 27, 2022 18:41

mike-lischke approved these changes Feb 28, 2022


parrt merged commit 1e35007 into antlr:dev Feb 28, 2022
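
As the PR description explains, the merged change moves the locks from process-wide statics to members owned by each antlr4::atn::ATN. The following is only a minimal sketch of that idea, not the runtime's actual code: `Atn`, `Simulator`, `stateMutex`, `cachedStates`, and `lookupOrAdd` are hypothetical names chosen for illustration, and the cached value (`key * 2`) is a placeholder for real shared state.

```cpp
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical stand-in for antlr4::atn::ATN. All shared, mutable state is
// rooted here, so the lock that guards it is a member rather than a static.
struct Atn {
  std::shared_mutex stateMutex;               // one lock per Atn instance
  std::unordered_map<int, int> cachedStates;  // placeholder for shared state
};

// Hypothetical stand-in for a lexer/parser simulator. It borrows an Atn and
// only ever locks that Atn's mutex, never a global one.
class Simulator {
 public:
  explicit Simulator(Atn& atn) : atn_(atn) {}

  // Returns the cached value for `key`, computing and caching it on a miss.
  int lookupOrAdd(int key) {
    {
      // Fast path: many readers of the same Atn may hold the lock together.
      std::shared_lock<std::shared_mutex> readLock(atn_.stateMutex);
      auto it = atn_.cachedStates.find(key);
      if (it != atn_.cachedStates.end()) {
        return it->second;
      }
    }
    // Slow path: exclusive access, but only with respect to this Atn.
    std::unique_lock<std::shared_mutex> writeLock(atn_.stateMutex);
    // emplace leaves an existing entry untouched if another thread inserted
    // the key between releasing the read lock and taking the write lock.
    return atn_.cachedStates.emplace(key, key * 2).first->second;
  }

 private:
  Atn& atn_;
};
```

Under this layout, simulators that share an `Atn` still synchronize with each other, while simulators built on different `Atn` instances never touch the same mutex. With a static mutex, every simulator in the process would contend on one lock regardless of which grammar it served.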
